-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add proposal for EC2 instance selector integration and dry-run #3107
Conversation
645613c
to
ff58cbe
Compare
Lovely write up @cPu1. Just to ensure I'm understanding:
We compute
Question:
If this is the case then why do we add the complexity of this being part of the
I don't quite see the value in having this as part of create cluster/nodegroup Q2:
|
Really thorough @cPu1 🎉 Furthering @aclevername's comment:
In this case, we would be better off not providing anything they can just run the instance selector themselves. IIUC, the value to adding this to eksctl is so that users do not need to think about it. If they want to disagree with the Selector, it is probably the exact same effort to just run the selector themselves than to do a 2 step process through us. |
Users can do this by omitting the
Such a command would add little value because it'd be a mere wrapper around the |
👍
I don't understand the value it being printed out |
The proposed |
I think as a feature aside from
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good. Some comments on the overview for this feature as well as questions around how we represent. No major blockers in the current design.
|
||
## Motivation | ||
|
||
With support for multiple instance types and Spot instances for EKS Managed Nodegroups, users can create Spot instances where EKS will configure and launch an Autoscaling Group of Spot instances following Spot best practices. Users can set `managedNodeGroup.spot` and supply a list of instance types in `managedNodeGroup.instanceTypes` to instruct eksctl to create a managed nodegroup using Spot instances. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very focused on spot. Can we change the motivation to focus on the general case (that of course includes spot) so that users are not confused?
|
||
With support for multiple instance types and Spot instances for EKS Managed Nodegroups, users can create Spot instances where EKS will configure and launch an Autoscaling Group of Spot instances following Spot best practices. Users can set `managedNodeGroup.spot` and supply a list of instance types in `managedNodeGroup.instanceTypes` to instruct eksctl to create a managed nodegroup using Spot instances. | ||
|
||
It is a Spot best practice to use multiple instance types, but with over 270 EC2 instance types, users have to spend time figuring out which instance types would be well suited for their nodegroup. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same call out here - this is an issue in general, not just for those using Spot.
The [EC2 Instance Selector](https://github.com/aws/amazon-ec2-instance-selector) tool addresses this problem by generating a list of instance types based on resource criteria such as vCPUs, memory, etc. This proposal aims to integrate the EC2 instance selector with eksctl to enable creating nodegroups with multiple instance types by passing the resource criteria. | ||
|
||
### Problem: a simple integration | ||
We could add the instance selector options—as CLI options (`--instance-selector-vcpus` and `--instance-selector-memory`) and in the config file—to `create cluster` and `create nodegroup` that would select the instance types matching the resource criteria for the user and proceed to creating the nodegroup. However, this gives users no control over selecting the instance types. Users may want to inspect and change the instances matching the instance selector criteria before proceeding to creating a nodegroup, and also to document the instance types being used for their nodegroups in the ClusterConfig file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a strawman against a certain option. Can we be more clear about the problem we want to solve here?
In particular, is it critical that we have a "human in the loop" here? We could always modify the config file with the selections and output them. Is there benefit to having the manual selection process? Doesn't this require the user to have an understanding of the instance type options anyways?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading further - it looks like this is actually the proposal - but we negate it early to introduce the --dry-run
functionality. Let's consider cutting or revising this section as its confusing.
|
||
## Proposal | ||
|
||
This design proposes adding a new general-purpose `--dry-run` option to `eksctl create cluster` and `eksctl create nodegroup` that skips cluster and nodegroup creation and instead outputs a ClusterConfig file representing the CLI options passed. This functionality extends to the instance selector integration where eksctl will set the instance types for a nodegroup when the instance selector options are supplied. This gives users the opportunity to inspect and, if required, change the instance types before proceeding with cluster and nodegroup creation. Users can omit the `--dry-run` option if they do not care about the instance types selected by eksctl. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--dry-run
is one aspect of the feature. Shouldn't we lead with the new --instance-selector-[ ]
functionality?
Along with the `--dry-run` option, two new options `--instance-selector-vcpus` and `--instance-selector-memory` will be added to `eksctl create cluster` and `eksctl create nodegroup`. These options will also be supported in the ClusterConfig file for both managed and self-managed nodegroups with the following schema: | ||
|
||
```yaml | ||
instanceSelector: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are vCPU and memory the only inputs we will accept?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd start off by adding vCPU and memory, and add more options based on user feedback. Otherwise we'd have to add a CLI option instance-selector-*
for each instance selector option. Are there other options that you think we should add in the initial implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GPU would be an important one
is --instance-selector
too verbose?
|
||
### eksctl create cluster | ||
|
||
When `eksctl create cluster` is called with the instance selector options and `--dry-run`, eksctl will output a ClusterConfig file containing a nodegroup representing the CLI options and the instance types set to the instances matched by the instance selector resource criteria. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should ensure --dry-run
outputs all options, not just node groups!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update looks good!
- name: <name> | ||
instancesDistribution: | ||
instanceTypes: ["c5.large", "c5a.large", "c5ad.large", "c5d.large", "t2.medium", "t3.medium", "t3a.medium"] | ||
instanceSelector: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we automatically add a comment in the file "Ignored during execution" or something of the like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. We can have something like:
# Ignored during execution since `instanceTypes` is also set.
instanceSelector:
|
||
managedNodeGroups: | ||
- name: linux | ||
instanceSelector: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it confusing that this is only in the MNG section? How do we handle merging MNG and non-MNG groups?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow. This field is also present in the nodeGroups
section below: https://github.com/weaveworks/eksctl/pull/3107/files#diff-af7ec3d70c271157e75ad67f9bb60c75c8596ace0bee45c3b6a83d7debd9408cR126
How do we handle merging MNG and non-MNG groups?
Can you expand on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you answered my question -must had missed it on the first pass.
- name: <name> | ||
spot: true | ||
instanceTypes: ["c3.large", "c4.large", "c5.large", "c5d.large", "c5n.large", "c5a.large"] | ||
instanceSelector: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@cPu1 I'm still slightly confused as to why we need the |
ff58cbe
to
9e9a888
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update LGTM 👍
We may want to add GPU as an instance selector option.
Also is --instance-select-x
too verbose? May not be but wanted to see if you had thoughts on other options.
While the instance selector feature helps select a set of instances matching the resource criteria, some users may want to inspect and change the instances selected by the instance selector before proceeding to creating a nodegroup, and also to document the instance types being used for their nodegroups in the ClusterConfig file. For this reason, this design also proposes adding a new general-purpose `--dry-run` option to `eksctl create cluster` and `eksctl create nodegroup` that skips cluster and nodegroup creation and instead outputs a ClusterConfig file containing the default values set by eksctl and representing the supplied CLI options. This functionality extends to the instance selector integration where eksctl will set the instance types for a nodegroup when the instance selector options are supplied. This gives users the opportunity to inspect and, if required, change the instance types before proceeding with cluster and nodegroup creation. Users can omit the `--dry-run` option if they do not care about the instance types selected by eksctl. | ||
|
||
There are certain one-off options that cannot be represented in the ClusterConfig file, e.g., `--install-vpc-controllers`. | ||
Users would expect `eksctl create cluster --<options...> --dry-run > config.yaml` followed by `eksctl create cluster -f config.yaml` to be equivalent to running the first command without `--dry-run`. eksctl would therefore disallow passing options that cannot be represented in the config file when `--dry-run` is passed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Along with the `--dry-run` option, two new options `--instance-selector-vcpus` and `--instance-selector-memory` will be added to `eksctl create cluster` and `eksctl create nodegroup`. These options will also be supported in the ClusterConfig file for both managed and self-managed nodegroups with the following schema: | ||
|
||
```yaml | ||
instanceSelector: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GPU would be an important one
is --instance-selector
too verbose?
|
||
### eksctl create cluster | ||
|
||
When `eksctl create cluster` is called with the instance selector options and `--dry-run`, eksctl will output a ClusterConfig file containing a nodegroup representing the CLI options and the instance types set to the instances matched by the instance selector resource criteria. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update looks good!
Noted, I'll add it to the proposal. To be clear, the
It may be too verbose, but the instance selector option names by themselves are not very clear in the context of eksctl, e.g., just |
ba29124
to
ead28cd
Compare
- [Proposal](#proposal) | ||
- [Dry Run](#dry-run) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should these be in two separate proposal files? It feels a bit hard to distinguish them atm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They could be, but I'll have to reword some of the bits because right now it's written within the context of instance selector. I was planning to rename the title and filename to include dry-run, so I'll do that.
eb79c2b
to
aa60887
Compare
aa60887
to
f5bd1db
Compare
Description
Rendered
Closes #2998
Checklist
README.md
, or theuserdocs
directory)area/nodegroup
), target version (e.g.version/0.12.0
) and kind (e.g.kind/improvement
)BONUS POINTS checklist: complete for good vibes and maybe prizes?! 🤯
Community Note: